[SPARK-32100][CORE][TESTS][FOLLOWUP] Reduce the required test resources #29001

dongjoon-hyun wants to merge 2 commits into apache:master from dongjoon-hyun:SPARK-32100-2
Conversation
Test build #124962 has finished for PR 29001 at commit

Retest this please.

Test build #124965 has finished for PR 29001 at commit

Retest this please.

Test build #124976 has finished for PR 29001 at commit

Retest this please

Retest this please.

Test build #124980 has finished for PR 29001 at commit

retest this please

Test build #124996 has finished for PR 29001 at commit

Thank you, @HyukjinKwon !

Retest this please.

Test build #124998 has finished for PR 29001 at commit
```diff
  private val conf = new org.apache.spark.SparkConf()
    .setAppName(getClass.getName)
-   .set(SPARK_MASTER, "local-cluster[20,1,512]")
+   .set(SPARK_MASTER, "local-cluster[10,1,512]")
```
Any particular reason we need 10 or 20 executors? It still seems like a lot compared to other tests, which typically use 2 or 3. cc @holdenk
So I think this test came from a situation where we were experiencing a deadlock, and we wanted to make sure we re-created the potential deadlock that happened when we decommissioned most of the executors. That deadlock never made it into OSS Spark, but having the test here to catch it just in case is good. I think we could catch the same deadlock with 5 executors and decommissioning 4 of them, but @dongjoon-hyun is the one who found this potential issue, so I'll let him clarify :)
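For a rough sense of why the executor count matters on a shared Jenkins host, the master strings under discussion differ mainly in how much worker memory they reserve. The helper below is hypothetical (it is not part of the suite; the only assumption is the `local-cluster[numWorkers,coresPerWorker,memoryPerWorkerMB]` master format) and can be pasted into the Scala REPL:

```scala
// Hypothetical helper, not part of WorkerDecommissionExtendedSuite:
// total worker memory reserved by a local-cluster master string of the form
// local-cluster[numWorkers,coresPerWorker,memoryPerWorkerMB].
def localClusterMemoryMB(master: String): Int = {
  val Array(workers, _, memPerWorkerMB) =
    master.stripPrefix("local-cluster[").stripSuffix("]").split(",").map(_.trim.toInt)
  workers * memPerWorkerMB
}

assert(localClusterMemoryMB("local-cluster[20,1,512]") == 10240) // original suite: ~10 GB
assert(localClusterMemoryMB("local-cluster[10,1,512]") == 5120)  // this PR: ~5 GB
assert(localClusterMemoryMB("local-cluster[5,1,512]")  == 2560)  // 5-executor idea: ~2.5 GB
```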
LGTM, thanks for working around the crowded Jenkins env :) I think we can explore whether to reduce it further in a follow-on, but removing causes of test flakiness sooner rather than later is better for everyone working on Spark.
### What changes were proposed in this pull request?

This PR aims to disable the dependency tests (`test-dependencies.sh`) on Jenkins.

### Why are the changes needed?

- First of all, GitHub Actions already provides the same test coverage and is more stable.
- Second, `test-dependencies.sh` currently fails very frequently in the AmpLab Jenkins environment. For example, on the following unrelated PR it failed 5 times within 6 hours.
  - #29001

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the Jenkins without the `test-dependencies.sh` invocation.

Closes #29004 from dongjoon-hyun/SPARK-32178.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
+1, LGTM

This is a test-only PR and I verified this manually.

Merged to master.

Test build #125010 has finished for PR 29001 at commit
### What changes were proposed in this pull request?

This PR aims to reduce the required test resources in WorkerDecommissionExtendedSuite.

### Why are the changes needed?

When the Jenkins farm is crowded, the following failure currently happens ([here](https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-3.2-hive-2.3/890/testReport/junit/org.apache.spark.scheduler/WorkerDecommissionExtendedSuite/Worker_decommission_and_executor_idle_timeout/)):

```
java.util.concurrent.TimeoutException: Can't find 20 executors before 60000 milliseconds elapsed
    at org.apache.spark.TestUtils$.waitUntilExecutorsUp(TestUtils.scala:326)
    at org.apache.spark.scheduler.WorkerDecommissionExtendedSuite.$anonfun$new$2(WorkerDecommissionExtendedSuite.scala:45)
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the Jenkins.

Closes apache#29001 from dongjoon-hyun/SPARK-32100-2.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
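The `TimeoutException` above is thrown by `TestUtils.waitUntilExecutorsUp` when the requested number of executors does not register in time. As a reference point, here is a minimal, hypothetical sketch of that pattern with the reduced executor count, assuming it runs inside a Spark build where `local-cluster` mode and the Spark-internal `TestUtils` are available; the object name and test body are illustrative, and the real suite configures the master through its own constants rather than `setMaster`:

```scala
package org.apache.spark.scheduler // the real suite lives in this package (see the stack trace)

import org.apache.spark.{SparkConf, SparkContext, TestUtils}

object WorkerDecommissionSketch {
  def main(args: Array[String]): Unit = {
    // Reduced footprint: 10 workers with 1 core and 512 MB each, instead of 20.
    val conf = new SparkConf()
      .setAppName("WorkerDecommissionSketch")
      .setMaster("local-cluster[10,1,512]")

    val sc = new SparkContext(conf)
    try {
      // The call that timed out on a crowded Jenkins when 20 executors were requested.
      TestUtils.waitUntilExecutorsUp(sc, 10, 60000)
      // ... exercise executor decommissioning and the idle timeout here ...
    } finally {
      sc.stop()
    }
  }
}
```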
### What changes were proposed in this pull request?

This PR aims to reduce the required test resources in WorkerDecommissionExtendedSuite.

### Why are the changes needed?

When the Jenkins farm is crowded, the failure shown above currently happens ([Jenkins test report](https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-3.2-hive-2.3/890/testReport/junit/org.apache.spark.scheduler/WorkerDecommissionExtendedSuite/Worker_decommission_and_executor_idle_timeout/)).

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins.